# 2 Exploratory data analysis

## 2.1 Data structure and summary

#> 'data.frame':    1000 obs. of  32 variables:
#>  $ OBS.            : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ CHK_ACCT        : Factor w/ 4 levels "0","1","2","3": 1 2 4 1 1 ..
#>  $ DURATION        : int  6 48 12 42 24 36 24 36 12 30 ...
#>  $ HISTORY         : Factor w/ 5 levels "0","1","2","3",..: 5 3 5 3..
#>  $ NEW_CAR         : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 ..
#>  $ USED_CAR        : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 1 ..
#>  $ FURNITURE       : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 2 1 1 ..
#>  $ RADIO.TV        : Factor w/ 2 levels "0","1": 2 2 1 1 1 1 1 1 2 ..
#>  $ EDUCATION       : Factor w/ 3 levels "-1","0","1": 2 2 3 2 2 3 2..
#>  $ RETRAINING      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 ..
#>  $ AMOUNT          : int  1169 5951 2096 7882 4870 9055 2835 6948 3..
#>  $ SAV_ACCT        : Factor w/ 5 levels "0","1","2","3",..: 5 1 1 1..
#>  $ EMPLOYMENT      : Factor w/ 5 levels "0","1","2","3",..: 5 3 4 4..
#>  $ INSTALL_RATE    : int  4 2 2 2 3 2 3 2 2 4 ...
#>  $ MALE_DIV        : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 2 ..
#>  $ MALE_SINGLE     : Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 1 ..
#>  $ MALE_MAR_or_WID : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 ..
#>  $ CO.APPLICANT    : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 ..
#>  $ GUARANTOR       : Factor w/ 3 levels "0","1","2": 1 1 1 2 1 1 1 ..
#>  $ PRESENT_RESIDENT: Factor w/ 4 levels "1","2","3","4": 4 2 3 4 4 ..
#>  $ REAL_ESTATE     : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 2 ..
#>  $ PROP_UNKN_NONE  : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 1 1 1 ..
#>  $ AGE             : int  67 22 49 45 53 35 53 35 61 28 ...
#>  $ OTHER_INSTALL   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 ..
#>  $ RENT            : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 1 ..
#>  $ OWN_RES         : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 2 1 2 ..
#>  $ NUM_CREDITS     : int  2 1 1 1 2 1 1 1 1 2 ...
#>  $ JOB             : Factor w/ 4 levels "0","1","2","3": 3 3 2 3 3 ..
#>  $ NUM_DEPENDENTS  : int  1 1 2 2 2 2 1 1 1 1 ...
#>  $ TELEPHONE       : Factor w/ 2 levels "0","1": 2 1 1 1 1 2 1 2 1 ..
#>  $ FOREIGN         : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 ..
#>  $ RESPONSE        : Factor w/ 2 levels "0","1": 2 1 2 2 1 2 2 2 2 ..
#>       OBS.      CHK_ACCT    DURATION    HISTORY NEW_CAR USED_CAR
#>  Min.   :   1   0:274    Min.   : 4.0   0: 40   0:766   0:897   
#>  1st Qu.: 251   1:269    1st Qu.:12.0   1: 49   1:234   1:103   
#>  Median : 500   2: 63    Median :18.0   2:530                   
#>  Mean   : 500   3:394    Mean   :20.9   3: 88                   
#>  3rd Qu.: 750            3rd Qu.:24.0   4:293                   
#>  Max.   :1000            Max.   :72.0                           
#>  FURNITURE RADIO.TV EDUCATION RETRAINING     AMOUNT      SAV_ACCT
#>  0:819     0:720    -1:  1    0:903      Min.   :  250   0:603   
#>  1:181     1:280    0 :950    1: 97      1st Qu.: 1366   1:103   
#>                     1 : 49               Median : 2320   2: 63   
#>                                          Mean   : 3271   3: 48   
#>                                          3rd Qu.: 3972   4:183   
#>                                          Max.   :18424           
#>  EMPLOYMENT  INSTALL_RATE  MALE_DIV MALE_SINGLE MALE_MAR_or_WID
#>  0: 62      Min.   :1.00   0:950    0:452       0:908          
#>  1:172      1st Qu.:2.00   1: 50    1:548       1: 92          
#>  2:339      Median :3.00                                       
#>  3:174      Mean   :2.97                                       
#>  4:253      3rd Qu.:4.00                                       
#>             Max.   :4.00                                       
#>  CO.APPLICANT GUARANTOR PRESENT_RESIDENT REAL_ESTATE PROP_UNKN_NONE
#>  0:959        0:948     1:130            0:718       0:846         
#>  1: 41        1: 51     2:308            1:282       1:154         
#>               2:  1     3:149                                      
#>                         4:413                                      
#>                                                                    
#>                                                                    
#>       AGE        OTHER_INSTALL RENT    OWN_RES  NUM_CREDITS  
#>  Min.   : 19.0   0:814         0:821   0:287   Min.   :1.00  
#>  1st Qu.: 27.0   1:186         1:179   1:713   1st Qu.:1.00  
#>  Median : 33.0                                 Median :1.00  
#>  Mean   : 35.6                                 Mean   :1.41  
#>  3rd Qu.: 42.0                                 3rd Qu.:2.00  
#>  Max.   :125.0                                 Max.   :4.00  
#>  JOB     NUM_DEPENDENTS TELEPHONE FOREIGN RESPONSE
#>  0: 22   Min.   :1.00   0:596     0:963   0:300   
#>  1:200   1st Qu.:1.00   1:404     1: 37   1:700   
#>  2:630   Median :1.00                             
#>  3:148   Mean   :1.16                             
#>          3rd Qu.:1.00                             
#>          Max.   :2.00
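
The summary above shows a few values that may merit a check before modelling: AGE has a maximum of 125, EDUCATION has a single observation coded -1, and GUARANTOR has a single observation coded 2. A minimal sketch of how such rows could be flagged follows. The toy data frame `credit_toy`, the helper `flag_suspect`, the age cap of 100, and the assumption that EDUCATION and GUARANTOR are meant to be 0/1 dummies are all illustrative and not taken from the original analysis.

```r
# Toy rows only: they mimic the column types shown above, not the real records.
credit_toy <- data.frame(
  AGE       = c(67L, 22L, 125L, 45L),
  EDUCATION = factor(c("0", "0", "-1", "1")),
  GUARANTOR = factor(c("0", "1", "0", "2"))
)

# Hypothetical helper: returns the row numbers with an implausible age or
# with a code other than "0"/"1" in variables assumed to be binary dummies.
flag_suspect <- function(df, max_age = 100,
                         binary_vars = c("EDUCATION", "GUARANTOR")) {
  suspect <- df$AGE > max_age
  for (v in binary_vars) {
    suspect <- suspect | !(as.character(df[[v]]) %in% c("0", "1"))
  }
  which(suspect)
}

suspect_rows <- flag_suspect(credit_toy)
credit_toy[suspect_rows, ]
```

Whether such rows should be corrected, recoded, or dropped depends on the data dictionary, which is not shown here.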

Data Frame Summary

Credit

Dimensions: 1000 x 32
Duplicates: 0
| No | Variable | Stats / Values | Freqs (% of Valid) | Valid | Missing |
|---|---|---|---|---|---|
| 1 | OBS. [integer] | Mean (sd): 500 (289); min ≤ med ≤ max: 1 ≤ 500 ≤ 1000; IQR (CV): 500 (0.6) | 1000 distinct values (integer sequence) | 1000 (100.0%) | 0 (0.0%) |
| 2 | CHK_ACCT [factor] | 0; 1; 2; 3 | 274 (27.4%); 269 (26.9%); 63 (6.3%); 394 (39.4%) | 1000 (100.0%) | 0 (0.0%) |
| 3 | DURATION [integer] | Mean (sd): 20.9 (12.1); min ≤ med ≤ max: 4 ≤ 18 ≤ 72; IQR (CV): 12 (0.6) | 33 distinct values | 1000 (100.0%) | 0 (0.0%) |
| 4 | HISTORY [factor] | 0; 1; 2; 3; 4 | 40 (4.0%); 49 (4.9%); 530 (53.0%); 88 (8.8%); 293 (29.3%) | 1000 (100.0%) | 0 (0.0%) |
| 5 | NEW_CAR [factor] | 0; 1 | 766 (76.6%); 234 (23.4%) | 1000 (100.0%) | 0 (0.0%) |
| 6 | USED_CAR [factor] | 0; 1 | 897 (89.7%); 103 (10.3%) | 1000 (100.0%) | 0 (0.0%) |
| 7 | FURNITURE [factor] | 0; 1 | 819 (81.9%); 181 (18.1%) | 1000 (100.0%) | 0 (0.0%) |
| 8 | RADIO.TV [factor] | 0; 1 | 720 (72.0%); 280 (28.0%) | 1000 (100.0%) | 0 (0.0%) |
| 9 | EDUCATION [factor] | -1; 0; 1 | 1 (0.1%); 950 (95.0%); 49 (4.9%) | 1000 (100.0%) | 0 (0.0%) |
| 10 | RETRAINING [factor] | 0; 1 | 903 (90.3%); 97 (9.7%) | 1000 (100.0%) | 0 (0.0%) |
| 11 | AMOUNT [integer] | Mean (sd): 3271 (2823); min ≤ med ≤ max: 250 ≤ 2320 ≤ 18424; IQR (CV): 2607 (0.9) | 921 distinct values | 1000 (100.0%) | 0 (0.0%) |
| 12 | SAV_ACCT [factor] | 0; 1; 2; 3; 4 | 603 (60.3%); 103 (10.3%); 63 (6.3%); 48 (4.8%); 183 (18.3%) | 1000 (100.0%) | 0 (0.0%) |
| 13 | EMPLOYMENT [factor] | 0; 1; 2; 3; 4 | 62 (6.2%); 172 (17.2%); 339 (33.9%); 174 (17.4%); 253 (25.3%) | 1000 (100.0%) | 0 (0.0%) |
| 14 | INSTALL_RATE [integer] | Mean (sd): 3 (1.1); min ≤ med ≤ max: 1 ≤ 3 ≤ 4; IQR (CV): 2 (0.4) | 1: 136 (13.6%); 2: 231 (23.1%); 3: 157 (15.7%); 4: 476 (47.6%) | 1000 (100.0%) | 0 (0.0%) |
| 15 | MALE_DIV [factor] | 0; 1 | 950 (95.0%); 50 (5.0%) | 1000 (100.0%) | 0 (0.0%) |
| 16 | MALE_SINGLE [factor] | 0; 1 | 452 (45.2%); 548 (54.8%) | 1000 (100.0%) | 0 (0.0%) |
| 17 | MALE_MAR_or_WID [factor] | 0; 1 | 908 (90.8%); 92 (9.2%) | 1000 (100.0%) | 0 (0.0%) |
| 18 | CO.APPLICANT [factor] | 0; 1 | 959 (95.9%); 41 (4.1%) | 1000 (100.0%) | 0 (0.0%) |
| 19 | GUARANTOR [factor] | 0; 1; 2 | 948 (94.8%); 51 (5.1%); 1 (0.1%) | 1000 (100.0%) | 0 (0.0%) |
| 20 | PRESENT_RESIDENT [factor] | 1; 2; 3; 4 | 130 (13.0%); 308 (30.8%); 149 (14.9%); 413 (41.3%) | 1000 (100.0%) | 0 (0.0%) |
| 21 | REAL_ESTATE [factor] | 0; 1 | 718 (71.8%); 282 (28.2%) | 1000 (100.0%) | 0 (0.0%) |
| 22 | PROP_UNKN_NONE [factor] | 0; 1 | 846 (84.6%); 154 (15.4%) | 1000 (100.0%) | 0 (0.0%) |
| 23 | AGE [integer] | Mean (sd): 35.6 (11.7); min ≤ med ≤ max: 19 ≤ 33 ≤ 125; IQR (CV): 15 (0.3) | 54 distinct values | 1000 (100.0%) | 0 (0.0%) |
| 24 | OTHER_INSTALL [factor] | 0; 1 | 814 (81.4%); 186 (18.6%) | 1000 (100.0%) | 0 (0.0%) |
| 25 | RENT [factor] | 0; 1 | 821 (82.1%); 179 (17.9%) | 1000 (100.0%) | 0 (0.0%) |
| 26 | OWN_RES [factor] | 0; 1 | 287 (28.7%); 713 (71.3%) | 1000 (100.0%) | 0 (0.0%) |
| 27 | NUM_CREDITS [integer] | Mean (sd): 1.4 (0.6); min ≤ med ≤ max: 1 ≤ 1 ≤ 4; IQR (CV): 1 (0.4) | 1: 633 (63.3%); 2: 333 (33.3%); 3: 28 (2.8%); 4: 6 (0.6%) | 1000 (100.0%) | 0 (0.0%) |
| 28 | JOB [factor] | 0; 1; 2; 3 | 22 (2.2%); 200 (20.0%); 630 (63.0%); 148 (14.8%) | 1000 (100.0%) | 0 (0.0%) |
| 29 | NUM_DEPENDENTS [integer] | Min: 1; Mean: 1.2; Max: 2 | 1: 845 (84.5%); 2: 155 (15.5%) | 1000 (100.0%) | 0 (0.0%) |
| 30 | TELEPHONE [factor] | 0; 1 | 596 (59.6%); 404 (40.4%) | 1000 (100.0%) | 0 (0.0%) |
| 31 | FOREIGN [factor] | 0; 1 | 963 (96.3%); 37 (3.7%) | 1000 (100.0%) | 0 (0.0%) |
| 32 | RESPONSE [factor] | 0; 1 | 300 (30.0%); 700 (70.0%) | 1000 (100.0%) | 0 (0.0%) |

Generated by summarytools 1.0.0 (R version 4.1.2)
2022-04-28

## 2.2 Graphs for each variable

### 2.2.1 Graph for response

### 2.2.2 Graphs for all numeric variables

### 2.2.3 Graphs for the purpose-of-credit variables

### 2.2.4 Graphs for other categorical variables

### 2.2.5 Variables by response

### 2.2.6 Correlation plot

# 3 Data preparation

## 3.1 Balancing data

#> 
#>   0   1 
#> 240 560
#> 
#>   0   1 
#> 560 560
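
The two tables above are consistent with an 80% stratified training split (240 and 560 of the 300 and 700 responses) followed by upsampling class 0 to the size of class 1. The function actually used is not shown, so the following is only a base R sketch of that idea. The seed, the variable names, and the choice to work on a bare response vector instead of the full data frame are assumptions made for illustration.

```r
set.seed(123)  # arbitrary seed, only to make the sketch reproducible

# Response counts taken from the summary: 300 "0" and 700 "1".
response <- factor(rep(c("0", "1"), times = c(300, 700)))

# Stratified 80/20 split: draw 80% of the row indices within each class.
idx_by_class <- split(seq_along(response), response)
train_idx <- unlist(lapply(idx_by_class, function(i) {
  sample(i, size = round(0.8 * length(i)))
}))
train_y <- response[train_idx]
table(train_y)

# Upsampling: keep every training row, then resample the smaller class
# with replacement until both classes have the same number of rows.
train_by_class <- split(train_idx, train_y)
n_max <- max(lengths(train_by_class))
balanced_idx <- unlist(lapply(train_by_class, function(i) {
  if (length(i) < n_max) {
    c(i, sample(i, size = n_max - length(i), replace = TRUE))
  } else {
    i
  }
}))
balanced_y <- response[balanced_idx]
table(balanced_y)
```

In a full analysis `balanced_idx` would index the rows of the training data frame, so the duplicated class 0 rows carry their predictors with them.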

## 3.2 Cross validation

#> CART 
#> 
#> 1120 samples
#>   31 predictor
#>    2 classes: '0', '1' 
#> 
#> No pre-processing
#> Resampling: Cross-Validated (10 fold) 
#> Summary of sample sizes: 1008, 1008, 1008, 1008, 1008, 1008, ... 
#> Resampling results across tuning parameters:
#> 
#>   cp      Accuracy  Kappa 
#>   0.0223  0.653     0.3054
#>   0.0357  0.641     0.2821
#>   0.3071  0.546     0.0911
#> 
#> Accuracy was used to select the optimal model using the
#>  largest value.
#> The final value used for the model was cp = 0.0223.
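
The output above reports 10-fold cross-validation on 1120 samples, with 1008 samples in each training subset, and selects `cp` by the largest mean accuracy. The following sketch reproduces that mechanism by hand with `rpart` instead of going through caret, to make the fold arithmetic and the selection rule explicit. The data frame `toy` is simulated and hypothetical, so the accuracies it yields are unrelated to the figures reported above; only the `cp` grid and the sample size are taken from the output.

```r
library(rpart)

set.seed(123)  # arbitrary seed for the sketch
n <- 1120      # sample size reported above
k <- 10        # number of folds reported above

# Simulated stand-in data (hypothetical): two predictors, binary outcome.
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- factor(ifelse(toy$x1 + rnorm(n) > 0, "1", "0"))

# Assign each row to one of k folds of equal size (112 rows each).
fold_id <- sample(rep(seq_len(k), length.out = n))
train_sizes <- vapply(seq_len(k), function(f) sum(fold_id != f), integer(1))
train_sizes  # every training subset has 1120 - 112 = 1008 rows

# cp values listed in the output above.
cp_grid <- c(0.0223, 0.0357, 0.3071)

# For each cp: fit on k - 1 folds, score accuracy on the held-out fold,
# then average the k accuracies.
cv_accuracy <- sapply(cp_grid, function(cp) {
  fold_acc <- sapply(seq_len(k), function(f) {
    fit <- rpart(y ~ ., data = toy[fold_id != f, ], method = "class",
                 control = rpart.control(cp = cp))
    pred <- predict(fit, newdata = toy[fold_id == f, ], type = "class")
    mean(pred == toy$y[fold_id == f])
  })
  mean(fold_acc)
})
names(cv_accuracy) <- cp_grid

# Selection rule stated in the output: keep the cp with the largest accuracy.
best_cp <- cp_grid[which.max(cv_accuracy)]
```
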
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction  0  1
#>          0 42 47
#>          1 18 93
#>                                         
#>                Accuracy : 0.675         
#>                  95% CI : (0.605, 0.739)
#>     No Information Rate : 0.7           
#>     P-Value [Acc > NIR] : 0.802764      
#>                                         
#>                   Kappa : 0.32          
#>                                         
#>  Mcnemar's Test P-Value : 0.000515      
#>                                         
#>             Sensitivity : 0.700         
#>             Specificity : 0.664         
#>          Pos Pred Value : 0.472         
#>          Neg Pred Value : 0.838         
#>              Prevalence : 0.300         
#>          Detection Rate : 0.210         
#>    Detection Prevalence : 0.445         
#>       Balanced Accuracy : 0.682         
#>                                         
#>        'Positive' Class : 0             
#>
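
The statistics in the confusion matrix output can be recomputed directly from the four cell counts, with class 0 as the positive class. The sketch below does this in base R; the variable names are chosen here for illustration, and the formulas are the standard definitions of each measure.

```r
# Cell counts from the confusion matrix above (rows: prediction, cols: reference).
cm <- matrix(c(42, 18, 47, 93), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

n  <- sum(cm)
tp <- cm["0", "0"]  # predicted 0, actually 0 (positive class is 0)
fn <- cm["1", "0"]  # predicted 1, actually 0
fp <- cm["0", "1"]  # predicted 0, actually 1
tn <- cm["1", "1"]  # predicted 1, actually 1

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
ppv         <- tp / (tp + fp)
npv         <- tn / (tn + fn)
prevalence  <- (tp + fn) / n
balanced_accuracy <- (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement corrected for chance agreement.
expected_agreement <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa <- (accuracy - expected_agreement) / (1 - expected_agreement)

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv, npv = npv,
        prevalence = prevalence, balanced_accuracy = balanced_accuracy,
        kappa = kappa), 3)
```

The accuracy of 0.675 is below the no-information rate of 0.7, which is the share of class 1 in the 200 test observations (140 of 200).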